SCREENING INTERACTOR MOLECULES WITH WHOLE GENOME OLIGONUCLEOTIDE OR POLYNUCLEOTIDE 
ARRAYS 



(57) Abstract 

This invention relates to methods for the identification of nucleic 
acids by direct hybridization to high-density oligonucleotide arrays. The 
methods of this invention comprise the steps of: (a) screening a DNA library, 
such as an S. cerevisiae genomic DNA library, by performing a double 
hybrid screening method with a recombinant vector containing a DNA insert 
encoding a candidate protein of interest and then selecting the clones from 
the DNA library that code for proteins that interact with the candidate protein 
of interest; and (b) hybridizing the DNA inserts contained in the clones that 
have been selected in step (a) using an oligonucleotide probe matrix wherein 
the probe locations on the host genome cover all of the coding sequences, 
determining the hybridization location and consequently, the gene coding 
for a specific protein that interacts with the candidate protein of interest in 
the double hybrid screening system. This invention is also directed to the 
polynucleotides obtained by the methods of this invention, the polypeptides 
encoded by those polynucleotides and the DNA arrays utilized in the methods 
of this invention. 

BACKGROUND OF THE INVENTION 
An estimated 6,000 genes were identified upon completion of the sequencing of the 
Saccharomyces cerevisiae genome. Fewer than half of these genes have a known biological 
function (1,2). Understanding how these newly sequenced genes function in both defined 
and emerging biochemical pathways is a major challenge for researchers in the post-genome 
era. Efficient functional characterization of these genes requires strategies for scaling genetic 
analyses to the whole genome level (3). The mRNA expression pattern, disruption 
phenotype, and protein-protein interactions are key questions that need to be 
addressed for every gene in a genome. 

Plasmid-based library selections are an established approach to the functional analysis 
of uncharacterized genes, and can help elucidate biological function by identifying, for 
example, physical interactors for a gene and genetic enhancers and suppressors of mutant 
phenotypes. However, the application of these selections to every gene in a eukaryotic 
genome involves the need to manipulate and sequence hundreds of DNA plasmids. Thus, 
applying traditional methods of functional analysis to every gene in a genome is limited by 
labor and cost. 

Because the discovery of thousands of uncharacterized genes by genome sequencing 
projects has increased the need for methods of large scale functional analysis, several 
approaches have been initiated to identify genes that, when disrupted or removed, lead to 
selective growth disadvantages (14-16). A promising complementary approach is the 
application of established genetic screens to every gene in an organism in an attempt to 
assign a biological function to every open reading frame. Genome-wide analyses based on 
two-hybrid screens, enhanced synthetic lethal screens, and screens for signal peptide 
sequences have been proposed (17-19). 

The two-hybrid assay exploits the ability of a pair of interacting proteins to bring a 
transcription activation domain into close proximity with a DNA-binding site that regulates 
the expression of an adjacent reporter gene. The assay employs chimeric genes which express 
two types of hybrid proteins. The first hybrid protein contains a transcriptional activation 
domain fused to a first test protein. The second hybrid protein contains the DNA-binding 
domain of a transcriptional activator fused to a second test protein. If the two test proteins are able 
to interact, they bring the two domains of the transcriptional activator into close proximity 
sufficient to cause transcription, which can then be detected by the activity of a marker gene 
that contains a binding site for the DNA-binding domain. 

The two-hybrid assay can be used to test a multiplicity of proteins simultaneously to 
determine whether they interact with a known protein. For example, a DNA fragment 
encoding the DNA-binding domain may be fused to a DNA fragment encoding the known 
protein in order to provide one hybrid. This hybrid is introduced into the cells carrying a 
marker gene. For the first hybrid, a library of plasmids can be constructed which may include, 
for example, total mammalian cDNA fused to the DNA sequence encoding the activation 
domain. This library is introduced into the cells carrying the second hybrid. If any individual 
plasmid from the library encodes a protein that is capable of interacting with the known 
protein, a positive signal will be obtained. However, because repetitive dideoxy sequencing 
is required to exhaustively identify the results of a screen, application of these methods to 
tens of thousands of genes is also limited by time, labor, and expense. 

Two-hybrid screens for protein-protein interactions provide a genetic tool that can 
be applied, in principle, to every gene in a genome. The Escherichia coli bacteriophage T7 
genome has already been characterized with exhaustive two-hybrid screening and sequencing 
for each known gene. Even with the use of novel strategies for highly efficient two-hybrid 
screening, however, an analysis of all genes encoded in the human genome would require 
sequencing of approximately 1 x 10^5 sequence fragments. As an alternative, genes may be 
individually cloned into two-hybrid vectors and tested in a pairwise manner. One 
disadvantage of this approach is that testing only the full length form of a gene might fail to 
identify those interactions that occur only with isolated domains of a protein (20). Functional 
selections that need to be performed in mammalian cells would also benefit from more highly 
parallel analysis. For example, it is conceivable to select for human genes that yield 
phenotypes, such as increased drug or pathogen resistance, when overexpressed in cell lines. 
The use of array hybridization to analyze results from these screens would eliminate the need 
to maintain large numbers of individual clones in tissue culture until they can be sequenced. 
Thus, the present invention overcomes the problems associated with the prior art through 
the use of DNA arrays or matrices, permitting highly parallel identification of the sequence 
and orientation of nucleic acid elements in a pool. 

SUMMARY OF THE INVENTION 

The methods of this invention comprise the steps of: (a) screening a DNA library, 
such as an S. cerevisiae genomic DNA library, by performing a double hybrid method with 
a recombinant vector containing a DNA insert encoding a candidate protein of interest and 
then selecting the clones from the DNA library that code for proteins that interact with the 
candidate protein of interest; and (b) hybridizing the DNA inserts contained in the clones that 
have been selected in step (a) using an oligonucleotide probe matrix, wherein the probe 
locations on the host genome cover all of the coding sequences, determining the 
hybridization location and consequently, the gene coding for a specific protein that interacts 
with the candidate protein of interest in the double hybrid screening system. Thus, the 
methods of this invention allow screening at a very large scale for DNA sequences having 
functional utility and avoid the systematic sequencing of the DNA inserts of interest required 
by prior art methods. 

This invention is also directed to the polynucleotides obtained by the methods of this 
invention and the polypeptides encoded by those polynucleotides. In addition, the invention 
is directed to the DNA arrays or matrices utilized in the methods of this invention. 

Oligonucleotide arrays can be synthesized for any organism for which complete or 
partial sequence information is available. The time required to analyze the results of a genetic 
selection can be drastically reduced, making it feasible to apply conventional screens to very 
large numbers of genes in a mammalian genome. Analysis of screens by array hybridization 
is adaptable to any genome-wide functional selection or experiment where the output is a set 
of nucleic acid sequences. 

For example, DNA arrays containing oligonucleotides complementary to every gene 
in the Saccharomyces cerevisiae genome can be used to analyze the results from plasmid 
based genetic screens in a single experiment. Based on the recently completed sequence of 
Saccharomyces cerevisiae, the first high density arrays containing oligonucleotides 
complementary to every gene in the yeast genome have been designed and synthesized. 
Two-hybrid protein-protein interaction screens were carried out for Saccharomyces 
cerevisiae genes implicated in mRNA splicing and microtubule assembly. Hybridization of 
labeled DNA derived from positive clones is sufficient to characterize the results of a screen 
in a single experiment allowing rapid detection of both established and novel biological 
interactions. These results demonstrate the use of oligonucleotide arrays for the analysis of 
two-hybrid screens. This approach is generally applicable to the analysis of a range of genetic 
selections with outputs of high complexity. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 represents a method for identifying sequences following a genetic selection. 
Rather than individual purification and dideoxy sequencing, all clones are pooled from plates, 
and plasmid DNA is isolated in a single purification. PCR amplification using primers with 
3' sequence corresponding to the vector sequence is used to selectively enrich for insert DNA 
from the plasmid pool. Amplified insert DNA is fragmented with DNase I, labeled with 
biotin-ddATP, and hybridized to an array containing oligonucleotide probes for every gene 
in the yeast genome. 

Figures 2a and 2b depict fluorescence images of a high-density oligonucleotide 
array containing 25-mer probes for nearly every gene on Saccharomyces cerevisiae 
chromosomes 5 through 10. Fig. 2a depicts the fluorescence pattern obtained following 
hybridization of 11 control genes: YEL002c, YEL003w, YEL005c, YEL006w, YEL018w, 
YEL019c, YEL021w, YEL024w, YHL014c, YHL045w, and YHL044c. Dark areas 
correspond to probes for genes not present in the control pool. Fig. 2b provides a close-up 
view of gene YHL014c, which shows the exact probe features that hybridize to the insert. 
The red grid highlights all probe features for YHL014c. The top row of probe elements contains 
oligonucleotides perfectly complementary to the gene sequence, while the bottom rows contain a 
mismatch in the central position of the oligonucleotide. Approximate locations of 
complementary oligonucleotide probes along the YHL014c ORF are also shown. 
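
As an illustration of how the perfect-match and mismatch rows described for Fig. 2b might be read out computationally, the following Python sketch calls a gene detected when a sufficient fraction of its probe pairs show perfect-match (PM) signal clearly above mismatch (MM) signal. The function name, the intensity values, and both thresholds are hypothetical and are not taken from this disclosure.

```python
from typing import Sequence


def gene_detected(pm: Sequence[float], mm: Sequence[float],
                  min_ratio: float = 1.5, min_fraction: float = 0.5) -> bool:
    """Return True when enough probe pairs show PM signal above MM signal.

    min_ratio and min_fraction are illustrative cutoffs (assumptions).
    """
    if len(pm) != len(mm) or len(pm) == 0:
        raise ValueError("pm and mm must be non-empty and of equal length")
    # A probe pair counts as positive when PM exceeds MM by the chosen ratio.
    positive_pairs = sum(1 for p, m in zip(pm, mm) if p > min_ratio * m)
    return positive_pairs / len(pm) >= min_fraction


# Hypothetical intensities for five probe pairs of two genes
present = gene_detected([900, 850, 40, 1000, 780], [60, 70, 35, 90, 55])
absent = gene_detected([30, 45, 28, 50, 33], [29, 40, 31, 48, 35])
```

Comparing each perfect-match probe against its own mismatch partner is one simple way to discount non-specific hybridization; other scoring schemes are equally possible.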

Figure 3 depicts a fluorescence image of a portion of a high-density oligonucleotide 
array containing 25-mer probes to nearly every gene on Saccharomyces cerevisiae 
chromosomes 5 through 10 following hybridization of the YMR117c two-hybrid sample. The 
three bright strips correspond to probes covering nucleotides 156-654 of ORF YER018c, 
nucleotides 1860-2484 of YER032w, and nucleotides 4092-4452 of YGL197w. Terminal 
probes are described as the most 5' nucleotide of the most 5' probe and the most 3' 
nucleotide of the most 3' probe that gave a positive signal. Dark areas correspond to probes 
for genes not present following genetic selection. 
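
The terminal-probe convention used for Fig. 3 can be sketched in a few lines of Python. The sketch assumes 25-mer probes and 1-based ORF coordinates; the function name and the list of positive probe start positions are hypothetical, chosen only so that the outermost probes reproduce the 156-654 interval quoted for YER018c.

```python
PROBE_LENGTH = 25  # 25-mer probes, as stated for the arrays in the figures


def covered_region(positive_probe_starts, probe_length=PROBE_LENGTH):
    """Return (first, last) 1-based ORF coordinates spanned by positive probes.

    first: most 5' nucleotide of the most 5' positive probe
    last:  most 3' nucleotide of the most 3' positive probe
    """
    if not positive_probe_starts:
        return None
    first = min(positive_probe_starts)
    last = max(positive_probe_starts) + probe_length - 1
    return first, last


# Hypothetical 1-based start positions of probes that gave a positive signal
region = covered_region([156, 301, 452, 630])
# region == (156, 654)
```

The reported interval is therefore a lower bound on the insert: sequence lying between the outermost positive probes and their non-hybridizing neighbors is not resolved.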

DETAILED DESCRIPTION OF THE INVENTION 

The present invention provides methods for screening polynucleotides, such as 
polynucleotides contained in the genome or in a cDNA obtained from the mRNA of a given 
prokaryotic or eukaryotic host or in a DNA insert of a random peptide DNA library. In 
essence, the methods of this invention comprise the steps of: a) subjecting the polynucleotide 
of interest to a two-hybrid screening method; and b) subjecting the polynucleotides selected 
at step a) to a hybridization reaction onto a matrix substrate onto which oligonucleotide or 
polynucleotide probes have been immobilized (i.e., DNA array). 

Any two-hybrid screening method may be used to complete step a) of the methods 
of this invention. For example, the yeast two hybrid system developed by Fields and 
coworkers (21) utilizes hybrid genes to detect protein-protein interactions by means of direct 
activation of a reporter-gene expression. U.S. Patents Nos. 5,283,173 and 5,468,614 
describing this technique are relied upon and incorporated by reference. Mammalian two 
hybrid systems using β-galactosidase complementation to monitor protein-protein 
interactions in intact eukaryotic cells (22, 23), phage display (24) and double tagging assays 
(25) represent alternative two-hybrid assay approaches to screen complex libraries of 
proteins for direct interaction with a given ligand. In addition, reverse two hybrid screening 
procedures, such as those described by White (26) and Vidal et al. (27, 28) can be utilized 
in the methods of this invention. Most preferably, the two-hybrid system utilized in the 
methods of this invention is that described by Daniel Ladant et al. in U.S. provisional patent 
application No. 60067308 entitled A BACTERIAL MULTI-HYBRID SYSTEM AND 
APPLICATIONS THEREOF, filed December 4, 1997, the entire disclosure of which is 
relied upon and incorporated herein by reference. 

The preparation and use of high density DNA arrays has been described in 
International patent applications WO 97/29212, WO 97/27317, WO 97/10365, and WO 
92/10588, the disclosures of which are relied upon and incorporated herein by reference. 
See also, Wodicka, L. et al. (1997) Nature Biotechnology. 15, 1359-1367. 

One embodiment of this invention (designated "Method 1" for convenience) provides 
a method for selecting a polynucleotide encoding a first polypeptide that is able to interact 
with a second polypeptide of interest. Specifically, this method comprises the following 
steps: 

a) providing a recombinant host cell containing a detectable gene, wherein the 
detectable gene expresses a detectable polypeptide when the detectable gene is activated by 
an amino acid sequence including a transcriptional activation domain, such as the 
transcription activation domain of the GAL4 protein; 

b) providing a first chimeric gene that is capable of being expressed in the host cell, 
the first chimeric gene comprising a DNA sequence that encodes a first hybrid polypeptide 
encoded by a given prokaryotic or eukaryotic organism, said first hybrid polypeptide 
comprising: 

(i) the transcriptional activation domain; and 

(ii) a first test polypeptide that is to be tested for interaction with the second 
test polypeptide; 

c) providing a second chimeric gene that is capable of being expressed in the host 
cell, the second chimeric gene comprising a DNA sequence that encodes a second hybrid 
polypeptide, the second hybrid polypeptide comprising: 

(i) a DNA-binding domain, e.g., the DNA-binding domain of the GAL4 
protein, that recognizes a binding site on the detectable gene in the host 
cell; and 

(ii) a second test polypeptide that is to be tested for interaction with at least 
one first test polypeptide; 

wherein interaction between the first test polypeptide and the second test polypeptide 
in the host cell causes the transcriptional activation domain to activate transcription of the 
detectable gene; 

d) introducing the first chimeric gene and the second chimeric gene into the host 
cell; 

e) subjecting the host cell to conditions under which the first hybrid polypeptide and 
the second hybrid polypeptide are expressed in sufficient quantity for the detectable gene to 
be activated; 

f) selecting the host cell clones for which the detectable gene has been expressed to 
a degree greater than expression in the absence of interaction between the first test 
polypeptide and the second test polypeptide; 

g) optionally pooling the clones that have been positively selected at step f); 

h) amplifying the polynucleotides of interest contained in the clones of step f) or g) 
with a pair of oligonucleotide primers respectively hybridizing with a plasmid sequence 
located at the 5' end of the polynucleotide of interest and with a sequence complementary 
to a plasmid sequence located at the 3' end of the polynucleotide of interest coding for the 
first polypeptide; 

i) hybridizing the amplified polynucleotides obtained at step h) to a matrix substrate 
on which has been bound, at known locations, a plurality of sets of oligonucleotide or 
polynucleotide probes of predetermined sequence, each bound set of oligonucleotide or 
polynucleotide probes being able to hybridize with a specific polynucleotide carried by the 
genome of the organism from which the polynucleotide coding for the first test polypeptide 
originates; 

j) detecting the locations of the polynucleotide hybrid complexes obtained at step 
i) on the matrix substrate; and 

k) optionally determining the quantity of each hybrid complex detected at step j). 

Most preferably, the second chimeric gene is provided to the recombinant cell host 
before the introduction of the first chimeric gene. 
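
Steps i) through k) amount to a lookup from array location to gene followed by an optional quantification. A minimal Python sketch of that readout is given below, assuming that the known probe locations are available as a mapping from array coordinates to gene names. The function name, the coordinate scheme, the gene labels, the intensity values, and the threshold are all hypothetical.

```python
from collections import defaultdict


def identify_interactors(probe_map, intensities, threshold):
    """Map positive array locations back to genes and sum their signal.

    probe_map:   {(row, col): gene_name} for the known probe locations
    intensities: {(row, col): measured hybridization signal}
    threshold:   illustrative cutoff above which a location counts as positive
    """
    signal_by_gene = defaultdict(float)
    for location, signal in intensities.items():
        gene = probe_map.get(location)
        if gene is not None and signal >= threshold:
            # Step j): location detected; step k): accumulate its quantity.
            signal_by_gene[gene] += signal
    return dict(signal_by_gene)


# Hypothetical array layout and measurements
probe_map = {(0, 0): "GENE_A", (0, 1): "GENE_A", (1, 0): "GENE_B", (1, 1): "GENE_C"}
intensities = {(0, 0): 820.0, (0, 1): 640.0, (1, 0): 15.0, (1, 1): 410.0}
hits = identify_interactors(probe_map, intensities, threshold=100.0)
# hits == {"GENE_A": 1460.0, "GENE_C": 410.0}
```

Because the probe sequences and positions are fixed in advance, no sequencing is needed to name the genes recovered from the pooled clones.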

An alternate embodiment of the invention (designated "Method 2" for convenience) 
provides a method for selecting a polynucleotide encoding a first polypeptide that inhibits 
the interaction between a second polypeptide and a third polypeptide. Specifically, this 
method comprises the following steps: 

a) providing a recombinant host cell containing a detectable gene, wherein the 
detectable gene expresses a detectable polypeptide when the detectable gene is activated by 
an amino acid sequence including a transcriptional activation domain, e.g., GAL4; 

b) providing a first gene that is capable of being expressed in the host cell, said first 
gene comprising a DNA sequence that encodes a first polypeptide encoded by a given 
prokaryotic or eukaryotic organism, and for which its inhibition property on the interaction 
between a second and a third polypeptide is tested; 

c) providing a second chimeric gene that is capable of being expressed in the host cell, 
the second chimeric gene comprising a DNA sequence that encodes a second hybrid 
polypeptide encoded by a given prokaryotic or eukaryotic organism, said second hybrid 
polypeptide comprising: 

(i) the transcriptional activation domain; and 

(ii) a second test polypeptide that interacts with a third polypeptide; 

d) providing a third chimeric gene that is capable of being expressed in the host cell, 
the third chimeric gene comprising a DNA sequence that encodes a third hybrid polypeptide, 
the third hybrid polypeptide comprising: 

(i) a DNA-binding domain, such as that of GAL4, that recognizes a binding site on 
the detectable gene in the host cell; and 

(ii) a third test polypeptide that interacts with the second test polypeptide; 

wherein interaction between the second test polypeptide and the third test 
polypeptide in the host cell causes the transcriptional activation domain to activate 
transcription of the detectable gene; 

e) introducing the first gene, the second chimeric gene, and the third chimeric gene 
into the host cell; 

f) subjecting the host cell to conditions under which the second hybrid polypeptide 
and the third hybrid polypeptide are expressed in sufficient quantity for the detectable gene to be 
activated; 

g) selecting the host cell clones for which the detectable gene has been expressed to 
a degree lesser than its expression level in the absence of expression of the first polypeptide; 

h) optionally pooling the clones that have been positively selected at step g); 

i) amplifying the polynucleotides of interest contained in the clones of step g) or h) 
with a pair of oligonucleotide primers respectively hybridizing with a plasmid sequence 
located at the 5' end of the polynucleotide of interest and with a sequence complementary 
to a plasmid sequence located at the 3' end of the polynucleotide of interest coding for the 
first polypeptide; 

j) hybridizing the amplified polynucleotides obtained at step i) to a matrix substrate 
on which has been bound, at known locations, a plurality of sets of oligonucleotide or 
polynucleotide probes of predetermined sequence, each bound set of oligonucleotide or 
polynucleotide probes being able to hybridize with a specific polynucleotide carried by the 
genome of the organism from which the polynucleotide coding for the first test polypeptide 
originates; 

k) detecting the locations of the polynucleotide hybrid complexes obtained at step 
j) on the matrix substrate; and 

l) optionally determining the quantity of each hybrid complex detected at step k). 

Most preferably, the second and the third chimeric genes are provided to the 
recombinant cell host before the introduction of the first chimeric gene. 
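
The two methods differ in the direction of the selection: Method 1 keeps clones in which the detectable gene is expressed more strongly than the baseline, whereas Method 2 keeps clones in which it is expressed less strongly. The following Python sketch illustrates that difference only; the function name, the clone labels, the reporter levels, and the baseline value are hypothetical.

```python
def select_clones(reporter_levels, baseline, mode="forward"):
    """Pick clones by reporter expression relative to a baseline.

    forward: keep clones expressing more than the baseline (Method 1, step f)
    reverse: keep clones expressing less than the baseline (Method 2, step g)
    """
    if mode == "forward":
        return [c for c, level in reporter_levels.items() if level > baseline]
    if mode == "reverse":
        return [c for c, level in reporter_levels.items() if level < baseline]
    raise ValueError("mode must be 'forward' or 'reverse'")


# Hypothetical reporter measurements, in arbitrary units
levels = {"clone1": 5.0, "clone2": 0.4, "clone3": 1.0, "clone4": 2.2}
forward_hits = select_clones(levels, baseline=1.0, mode="forward")
reverse_hits = select_clones(levels, baseline=1.0, mode="reverse")
# forward_hits == ["clone1", "clone4"]; reverse_hits == ["clone2"]
```

In either case the selected clones are then pooled, amplified, and hybridized to the array in the same way.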

In Method 2 of the present invention, the first chimeric gene is preferably expressed 
under the control of an inducible promoter. Thus, the recombinant cell host that has been 
transformed with the three chimeric genes first expresses constitutively the second and the 
third chimeric gene in order to allow the interaction of the resulting second and third fusion 
polypeptides to take place. Then the expression of the first chimeric gene is induced using 
the appropriate inducing signal, such as the addition of an inducer molecule in the culture 
medium. For example, the inducible promoter Met 3E (inducible by the amino acid 
methionine) (29) may be used to control the expression of the first chimeric gene. 

For the purpose of describing this invention, a gene or a chimeric gene means a 
polynucleotide that encodes a polypeptide or a fusion polypeptide respectively, wherein the 
polynucleotide may or may not additionally include a polynucleotide sequence that drives its 
expression at the transcriptional or translational level. 

In a preferred embodiment of the methods of this invention, some of the 
polynucleotides obtained at step f) or g) of Method 1 or step g) or h) of Method 2 are 
(simultaneously with completion of the remaining steps in each method with the remaining 
polynucleotides) subjected to a DNA amplification reaction with a pair of primers, wherein 
at least one of the primers comprises, at its 5' end, a promoter region recognized by a specific 
RNA polymerase (e.g., the bacteriophage T7 promoter region), and then incubated in the 
presence of the corresponding RNA polymerase, such as the bacteriophage T7 polymerase, 
in an acellular enzyme medium. The resulting mRNA is then further incubated in the presence of a 
reverse transcriptase type enzyme and the resulting cDNA molecule is hybridized to a matrix 
substrate on which has been bound, at known locations, a plurality of sets of oligonucleotides 
or polynucleotides of predetermined sequence, each bound set of oligonucleotides being able 
to hybridize with a specific polynucleotide carried by the genome of the organism from which 
the polynucleotide coding for the first test polypeptide originates. The polynucleotide hybrid 
complexes obtained on the matrix substrate are then detected and compared with the results 
obtained from the matrix of Method 1 or Method 2. 
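
The comparison between the two hybridization results can be pictured as a set comparison of the genes detected on each matrix. The Python sketch below is one possible way to organize that comparison; the function name, the result keys, and the gene labels are hypothetical.

```python
def compare_hybridizations(primary_genes, confirmation_genes):
    """Compare genes detected on the main matrix with those detected
    after the transcription and reverse-transcription route."""
    primary, confirmation = set(primary_genes), set(confirmation_genes)
    return {
        "confirmed": sorted(primary & confirmation),
        "primary_only": sorted(primary - confirmation),
        "confirmation_only": sorted(confirmation - primary),
    }


# Hypothetical gene lists from the two hybridizations
comparison = compare_hybridizations(
    ["geneA", "geneB", "geneC"],
    ["geneA", "geneC", "geneD"],
)
# comparison["confirmed"] == ["geneA", "geneC"]
```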

It will be noted in the practice of the methods of this invention, that the 
polynucleotide inserts of the DNA library used to make the two-hybrid screening step may 
begin with a nucleotide which is not in phase with the transcriptional activation domain 
coding sequences. Despite the open reading frame shift occurring at the 5' end of the 
polynucleotide sequence, it has been observed that a correct polypeptide is synthesized, due 
to a probable jump of the ribosome, placing the ribosome back in the correct reading frame. 
Consequently, a shift in the reading frame at the beginning of the coding sequence of interest 
does not prevent the synthesis of the correct polypeptide interactor. 

In a most preferred embodiment of the methods according to this invention, the 
selected polynucleotides encoding the first polypeptide are labeled before performing the 
hybridization step, either during or after the PCR amplification step. The polynucleotide may 
be labeled with a radioactive element (32P, 35S, 3H, 125I) or by a non-isotopic molecule (for 
example, biotin, acetylaminofluorene, digoxigenin, 5-bromodeoxyuridine, fluorescein). 
Examples of non-radioactive labeling of nucleic acid fragments are described in French 
Patent No. 78 10975 or in Urdea et al. or Sanchez-Pescador et al. (30, 31). One of skill in the art 
will appreciate that other labeling techniques may also be used, such as those described in 
French Patents Nos. 2 422 956 and 2 528 755 or in Matthews et al. (32). 

One of the most important features of the hybridized DNA arrays or matrices utilized 
in the screening methods of this invention is that the DNA arrays allow, in a one step 
method, mapping of all the potential polypeptides interacting with a given defined 
polypeptide in a forward two-hybrid method, or inhibiting the interaction between two 
defined polypeptides in a reverse two-hybrid method. Thus, the hybridization pattern of 
oligo- or polynucleotides coding for the interactor polypeptides identifies the whole set of 
polypeptides of interest. In contrast, the prior art technique of systematic sequencing of 
every selected polynucleotide identified only individual interactor coding sequences and did 
not provide any understanding of the global interaction possibilities. 

Preferably, the oligonucleotide or polynucleotide probes bound to the substrate 
matrix in the methods of this invention are designed in such a manner that every region of 
the whole genome of the prokaryotic or eukaryotic host organism is able to specifically 
hybridize to at least one set of the oligonucleotide or polynucleotide probes. It is also 
preferred that sets of oligonucleotide or polynucleotide probes bound to the matrix substrate 
are complementary to adjacent sequences in the genome of the prokaryotic or eukaryotic 
host, such that the distance between the sequences is less than one kilobase, preferably less 
than 500 nucleotides and most preferably about 50 nucleotides. 
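
To make the spacing preference concrete, the following Python sketch lists probe start positions along a target sequence for a chosen spacing. It assumes that the distance is measured from the start of one probe to the start of the next, which the text does not specify; the function name and the example lengths are likewise hypothetical.

```python
def tile_probe_starts(sequence_length, probe_length=25, spacing=50):
    """Return 0-based start positions of probes tiled along a sequence.

    spacing is the assumed start-to-start distance between adjacent probes.
    """
    if spacing <= 0 or probe_length <= 0:
        raise ValueError("spacing and probe_length must be positive")
    if probe_length > sequence_length:
        return []
    # The last probe must still fit entirely within the sequence.
    return list(range(0, sequence_length - probe_length + 1, spacing))


# A 200-nucleotide target tiled with 25-mers every 50 nucleotides
starts = tile_probe_starts(sequence_length=200, probe_length=25, spacing=50)
# starts == [0, 50, 100, 150]
```

Closer spacing gives finer resolution of the insert boundaries at the cost of more probes per gene.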

It will also be apparent that the matrices obtained from the methods of this invention 
are valuable products themselves. Of particular interest is a matrix substrate comprising a 
plurality of immobilized sets of oligonucleotide or polynucleotide probes of predetermined 
sequence, each bound set of oligonucleotide or polynucleotide probes being able to hybridize 
with a specific polynucleotide carried by the genome of the organism from which the 
polynucleotide coding for the first test polypeptide originates; and at least one polynucleotide 
coding for one selected first test polypeptide being hybridized thereto. 

The DNA arrays used in the methods of the invention preferably contain 
oligonucleotide probes of between 10 and 100 nucleotides, more preferably between 10 and 
40 nucleotides, and cover the whole genome or part of the genome of interest. In one 
embodiment of the invention, the oligonucleotide probes immobilized onto the substrate 
matrix consist of Expressed Sequence Tags (ESTs). The DNA arrays of this invention may, 
alternatively, contain full-length coding polynucleotides corresponding to every identified 
gene of the host organism under study. For example, when S. cerevisiae is the target host, 
a typical DNA array used in performing the screening methods of the invention may contain 
6000 full-length polynucleotides, each polynucleotide comprising the full-length coding 
sequence of a gene among the 6000 genes identified for S. cerevisiae. 

Because the screening methods according to this invention make use of DNA probe 
arrays in order to identify the selected polynucleotides coding for the interactor polypeptides 
of interest, the methods are particularly well suited to polynucleotides derived from a host 
organism for which the whole genome has already been sequenced. However, the methods 
of this invention may also be applied to polynucleotides derived from a library generated from 
specific partially or totally sequenced chromosomes of complex host organisms, including 
humans. In one specific embodiment of the methods of this invention, the method is 
performed using, as a source of polynucleotide sequences to be tested, a library of randomly 
synthesized and identified polynucleotides. 

It will be readily apparent to those of skill in the art that application of the methods 
of this invention will lead to the identification of novel polynucleotides and their functions. 
These polynucleotides and the polypeptides encoded by these polynucleotides are within the 
scope of this invention. Of particular interest are peptides comprising a peptide domain that 
interacts with the second test polypeptide of interest. 

EXAMPLES 

Preparation of oligonucleotide arrays 

Oligonucleotide arrays containing over 65,000 DNA synthesis features were 
prepared using light-directed, solid phase combinatorial chemistry as previously described 
(6, 7). Each 50 x 50 µm synthesis feature comprises more than 10^7 copies of a discrete 
25-mer oligonucleotide that is complementary to a portion of a yeast gene. The full set of 
oligonucleotides includes an average of twenty synthesis features for each of the 6,321 genes 
identified from the Saccharomyces cerevisiae genome. These arrays were originally designed 
and used for the analysis of mRNA gene expression (Wodicka, L., Dong, H., Mittmann, M., 
Ho, M. H., and Lockhart, D. J., Nat. Biotechnology, 1997, 15, 1359-1367). 

Oligonucleotide arrays were first tested for the ability to identify specific gene 
fragments. A fluorescence image of an array following hybridization of eleven labeled PCR 
products reveals intense signals at discrete positions, with minimal background (Fig. 2a). 
Because the probes for a given gene are synthesized in adjacent positions, hybridization of 
PCR products is detected as horizontal rows of high intensity (Fig. 2b). Signal corresponding 
to all eleven genes was detected in the correct locations. No significant signal was detected 
for any other genes in the genome. Each experiment was performed in duplicate, and 
hybridization results were found to be reproducible (data not shown). 

After a biological selection, library elements in high abundance can be identified by 
dideoxy sequencing. However, detection of rare elements might require the sequencing of 
thousands of clones. To determine the ability to detect very rare elements using array 
hybridization, the control PCR products were remade without the 600 bp YEL006w gene 
fragment, and known amounts of this sequence were added to the pool. Concentrations of 
spiked YEL006w DNA as low as 5 pM were detectable by hybridization. Therefore, array 
hybridization is sensitive to library elements that comprise less than 1:10,000 of the total 
pool. This is consistent with previous gene expression experiments in which rare mRNAs 
present at frequencies below 1:100,000 were detected quantitatively (7). 
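
The sequencing burden for rare elements can be estimated with a short calculation. The Python sketch below assumes that clones are sampled independently from a large pool, so that the chance of missing an element of frequency f in n clones is (1 - f)^n. The function name and the 95% confidence level are hypothetical choices made for illustration.

```python
import math


def clones_to_sequence(frequency, confidence=0.95):
    """Number of clones to sample so that an element present at the given
    frequency is seen at least once with the given probability.

    Solves 1 - (1 - frequency)**n >= confidence for the smallest integer n.
    """
    if not 0 < frequency < 1 or not 0 < confidence < 1:
        raise ValueError("frequency and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - frequency))


# An element making up 1:10,000 of the pool
n_clones = clones_to_sequence(1 / 10_000)
```

Under these assumptions, an element present at 1:10,000 calls for on the order of 30,000 sequenced clones to be observed with 95% probability, whereas a single array hybridization interrogates the whole pool at once.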

Whole genome yeast arrays were then used to analyze the DNA products of two-hybrid
screens for protein-protein interactions. Identification of proteins that physically interact
within the cell can suggest how a gene product participates in cellular processes (8-11). In
the two-hybrid screen, two proteins are expressed in yeast as fusions to either the
DNA-binding domain or the activation domain of a transcription factor. Physical interaction
of the two proteins reconstitutes transcriptional activity, turning on a chromosomal gene
essential for survival under selective conditions (8). In screening for novel protein-protein
interactions, yeast cells are first transformed with a plasmid encoding a specific DNA-binding
fusion protein. A plasmid library of activation domain fusions derived from genomic DNA is
then introduced into these cells. Activation domain fusions found in cells that survive
selective conditions are considered to encode peptide domains that may interact with the
DNA-binding domain fusion protein.
Library construction 

A large yeast genomic DNA library of 5 x 10^6 clones (designated the "FRYL"
library) was made in E. coli MR32 strain according to a previously described procedure
[Elledge et al. PNAS USA, 88, 1731-1735 (1991)].

- Origin of the plasmid: pACTII (with minor modifications). 

- Origin of the genomic DNA: Ym955 (a gift of M. Johnston).

Ym955 = ura3-52, his3-200, ade2-101, lys2-801, leu2-3,112, trp1-901, tyr1-501,
gal4-542, gal80-538.

his3-200, trp1-901, gal4-542 and gal80-538 are deletions of all coding
sequences.

Genomic DNA was sonicated and blunted with three modifying enzymes (mung bean nuclease,
T4 DNA polymerase, and Klenow). Adaptors were ligated to the blunted ends. The adaptors were
designed to allow blunt ligation at one extremity and cohesive ligation with a 3 nucleotide
overhang at the other end.

The sequences of the adaptors were 5'-ATCCCGGACGAAGGCC (SEQ ID NO: 1) and
5'-GGCCTTCGTCCGG (SEQ ID NO: 2), and only the former was phosphorylated before
annealing to avoid self-ligation of the adaptors. After ligation the inserts were purified from
free adaptors and small fragments on a Chroma Spin column (Clontech).

The pACTII vector was digested with BamHI and the extremities were filled in with
dGTP by the Vent (exo-) polymerase (New England Biolabs), generating extremities
complementary to the 3 nucleotide overhang of the adaptors but preventing self-ligation of
the vector. (BamHI sites are reconstituted at each end of the insert.) This strategy prevents
self-ligation of the vector or ligation of multiple inserts.
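
The complementarity described above can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and it assumes the standard BamHI cut (G^GATCC, leaving a 5'-GATC overhang), which the text does not spell out. It anneals the two adaptor oligonucleotides (SEQ ID NO: 1 and 2), derives the single-stranded adaptor tail, and compares it with the vector tail that remains after a dGTP-only fill-in.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

ADAPTOR_LONG = "ATCCCGGACGAAGGCC"   # SEQ ID NO: 1
ADAPTOR_SHORT = "GGCCTTCGTCCGG"     # SEQ ID NO: 2
BAMHI_OVERHANG = "GATC"             # assumed: 5' overhang left by BamHI (G^GATCC)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def adaptor_overhang(long_oligo: str, short_oligo: str) -> str:
    """5' single-stranded tail of the long oligo once the short oligo
    is annealed to its 3' end (the other end of the duplex is blunt)."""
    duplex = reverse_complement(short_oligo)
    if not long_oligo.endswith(duplex):
        raise ValueError("short oligo does not anneal to the 3' end of the long oligo")
    return long_oligo[: len(long_oligo) - len(duplex)]


def fill_in(overhang: str, dntps: set) -> str:
    """Shorten a 5' overhang (written 5'->3') by extending the recessed
    3' end, copying the base nearest the duplex first, until a required
    nucleotide is not available."""
    remaining = overhang
    while remaining and remaining[-1].translate(COMPLEMENT) in dntps:
        remaining = remaining[:-1]
    return remaining


adaptor_tail = adaptor_overhang(ADAPTOR_LONG, ADAPTOR_SHORT)
vector_tail = fill_in(BAMHI_OVERHANG, {"G"})

# Vector and adaptor tails pair with each other ...
print(adaptor_tail, vector_tail, reverse_complement(vector_tail) == adaptor_tail)
# ... but neither tail is self-complementary, so vector-vector and
# adaptor-adaptor cohesive ligation is not expected.
print(reverse_complement(vector_tail) != vector_tail,
      reverse_complement(adaptor_tail) != adaptor_tail)
```

Under these assumptions the first line prints `ATC GAT True` and the second prints `True True`, which matches the stated purpose of the partial fill-in.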

Inserts and vectors were ligated together and ligation products were used to
transform E. coli MR32. 5 x 10^6 clones were obtained. All transformants were scraped from
dishes and the pool of transformants was frozen in LB/glycerol. The titer of the library was
1-2 x 10^9 transformants/ml.
EXAMPLE 1

To demonstrate the analysis of a genetic selection using oligonucleotide arrays, a
two-hybrid screen was conducted for the Saccharomyces cerevisiae gene YMR117c.
YMR117c is a previously uncharacterized ORF recently found by two-hybrid analysis to
interact with the U2 snRNP-associated splicing factor Prp11p (4).
Plasmids and strains

For the YMR117c screen, the yeast strains used for two-hybrid screening were
CG1945 and Y187 (Clontech). A pAS2ΔΔ bait vector was constructed from the pAS2
plasmid (Clontech) by deletion of the CYH2 gene and the HA epitope. A bait plasmid was
constructed by PCR amplification of YMR117c from genomic DNA and cloning into
pAS2ΔΔ as a BamHI-PstI fragment. The bait plasmid was verified by sequencing after
cloning.

The polynucleotide insert containing the chimeric gene GAL4/YMR117c consists of
SEQ ID NO: 3, wherein nucleotides 1-441 correspond to the GAL4 DNA binding domain.
The resulting encoded fusion polypeptide consists of SEQ ID NO: 4, wherein amino acids
1-147 correspond to the GAL4 DNA binding domain and amino acids 148-378 correspond
to the YMR117c peptide sequence.
YMR117c two-hybrid screen

CG1945 yeast cells were transformed with the bait vector and used in a mating
strategy (4). Y187 cells were first transformed with DNA from the FRYL two-hybrid library,
transformants were pooled, and aliquots of the cell suspension were frozen. The two strains
were mixed, concentrated onto filters, and incubated on rich medium for 4.5 h at 30 °C. The
cells were collected, and a 10^-3 dilution was spread on -L, -LW, and -W plates to score the
number of parental cells and the number of diploids. The rest of the cell suspension was
spread on -LWH plates and incubated for three days at 30 °C. 8.5 x 10^7 diploids were
screened, and 5800 His+ colonies were selected. 10 ml of an X-Gal mixture (0.5 % agar,
0.1 % SDS, 6 % dimethylformamide, and 0.04 % X-Gal) was poured on the plates and the
plates were incubated at 30 °C. Blue clones were checked after a 30 min to 18 h incubation
and streaked on -LWH selective plates. 108 total clones were identified as positive by the
X-Gal assay and processed as described below.
PCR amplification and labeling of DNA from pooled clones

A 200 µl volume of saturated culture (approximately 1 x 10^7 cells) from each of the
108 positive two-hybrid clones from the YMR117c two-hybrid screen was pooled (Fig. 1),
and DNA was isolated and purified as previously described (5). Primers containing vector
sequence at the 3' end were used to PCR amplify gene inserts from the plasmid mixture.
Specifically, using the vector-based primers T7FOR
(5'-GAATTGTAATACGACTCACTATAGGGAGGTGATGAAGATACCCCACC-3') (SEQ ID NO: 5) and T3REV
(5'-AGATGCAATTAACCCTCACTAAAGGGAGACGGGGTTTTTCAGTATCTACGATTC-3') (SEQ ID NO: 6), all
library inserts were PCR amplified in a single reaction. The 50 µl PCR reaction contained:
2.5 U of Taq DNA polymerase, 10 mM Tris (pH 8.5), 50 mM KCl, 1.5 mM MgCl2, 0.2 µM each
primer, and 250 µM each dNTP. Conditions used for amplification were as follows: 30 cycles
at 96°C for 30 s, 62°C for 30 s, 72°C for 2 min. Reaction products were purified in a
Qiaquick spin column (Qiagen). 1 µg total PCR product was fragmented with 0.1 U DNAse I
(amplification grade, GibcoBRL) for 2 min in 35 µl containing: 10 mM Tris-acetate (pH 7.5),
10 mM magnesium acetate, 50 mM potassium acetate, and 15 mM CoCl2. The DNAse I reaction
was then boiled for 15 min, chilled on ice, and incubated with 1 mmole biotin-ddATP (NEN)
and 25 U terminal transferase (Boehringer Mannheim) for 1 hour at 37°C. SSPE-T
hybridization buffer (0.9 M NaCl, 60 mM NaH2PO4, 6 mM EDTA, 0.005 % Triton X-100) was
added to a final volume of 200 µl.
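
The structure of the two amplification primers (SEQ ID NO: 5 and 6) can be made explicit with a short Python sketch. It is illustrative only: the promoter strings below are the commonly cited T7 and T3 phage promoter sequences, supplied here as an assumption and not quoted from this document, and the function name is hypothetical. The sketch splits each primer into a 5' leader, a promoter, and the 3' portion that, according to the text, anneals to vector sequence.

```python
# Commonly cited minimal phage promoter sequences (assumption, not from the text).
T7_PROMOTER = "TAATACGACTCACTATAGGG"
T3_PROMOTER = "AATTAACCCTCACTAAAGGG"

T7FOR = "GAATTGTAATACGACTCACTATAGGGAGGTGATGAAGATACCCCACC"         # SEQ ID NO: 5
T3REV = "AGATGCAATTAACCCTCACTAAAGGGAGACGGGGTTTTTCAGTATCTACGATTC"  # SEQ ID NO: 6


def split_primer(primer: str, promoter: str) -> tuple:
    """Return (5' leader, promoter, 3' vector-annealing portion)."""
    start = primer.find(promoter)
    if start < 0:
        raise ValueError("promoter sequence not found in primer")
    end = start + len(promoter)
    return primer[:start], primer[start:end], primer[end:]


for name, primer, promoter in (("T7FOR", T7FOR, T7_PROMOTER),
                               ("T3REV", T3REV, T3_PROMOTER)):
    leader, prom, annealing = split_primer(primer, promoter)
    print(f"{name}: leader={leader} promoter={prom} vector-annealing 3' end={annealing}")
```

Placing a phage promoter in the 5' tail of a primer is what allows the PCR product to serve directly as a template for in vitro transcription in the cDNA generation step.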

Generation of cDNA product from PCR product

RNA was transcribed from 240 ng of purified PCR product using T7 polymerase
(Ambion). The reaction was incubated an additional hour with 20 U DNAse I. RNA was
purified using an RNA spin column (Qiagen). 2.0 µg of RNA was used for first strand cDNA
synthesis (Promega). Reaction products were purified in a Qiaquick spin column (Qiagen),
and 1 µg total PCR product was digested and prepared for hybridization as described above.
Hybridization of DNA to the high-density oligonucleotide array 

DNA products generated from the library plasmid pool were partially DNAse I
digested, biotinylated, and hybridized to whole genome arrays (Fig. 3). Specifically, arrays
were prewashed with hybridization buffer (described above) 5 min prior to sample
hybridization. Following a 5 min incubation at 99°C, the sample was chilled on ice, allowed
to return to room temperature, and applied to the array. After a 12 hour hybridization at
42°C, the array was washed 10 times with 6X SSPE-T, washed with 0.5X SSPE-T for 15
min, and stained with a streptavidin-phycoerythrin conjugate (Molecular Probes) for 10 min,
all at 42°C. The staining buffer contained 6X SSPE-T, 0.5 mg/ml bovine serum albumin, and
1 mg/ml streptavidin-phycoerythrin. The array was washed 5 times with 6X SSPE-T prior
to scanning. Hybridization patterns were detected by using an argon ion laser to excite
phycoerythrin; the resulting emission was detected using a photomultiplier tube through a
560 nm bandpass filter (Molecular Dynamics). The entire array was read at a resolution of
7.5 µm in less than 20 min, generating quantitative signal for each probe element. The
collected data was analyzed using image and data analysis software (Affymetrix).

Orientation of genes was determined by hybridization of biotinylated cDNA products. 
All genes identified by array hybridization are listed in Table 1. 
Criteria for gene detection 

On chips A, B, C, and D, which contain an average of 20 probes per gene, the 
presence of a gene fragment was determined by visual and quantitative detection of three 
contiguous positive probes. On the E chip, which contains probes for 5' sequence from genes
which are longer than 1 kb, detection of two contiguous positive probes was considered
sufficient to detect a gene fragment.
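
The contiguity rule above can be written as a short check. The following Python sketch is illustrative only: it assumes each probe of a gene has already been called positive or negative, in probe order (the text gives no numeric threshold for that call), and the names used are hypothetical.

```python
# Minimum run of contiguous positive probes required per chip, as stated above.
MIN_CONTIGUOUS = {"A": 3, "B": 3, "C": 3, "D": 3, "E": 2}


def longest_positive_run(calls) -> int:
    """Length of the longest run of consecutive positive probe calls."""
    best = run = 0
    for positive in calls:
        run = run + 1 if positive else 0
        best = max(best, run)
    return best


def gene_detected(calls, chip: str) -> bool:
    """Apply the contiguous-probe criterion for the given chip."""
    return longest_positive_run(calls) >= MIN_CONTIGUOUS[chip]


# Hypothetical probe calls for one gene, ordered along the gene.
calls = [False, True, True, False, True]
print(gene_detected(calls, "A"))  # two contiguous positives: below the A-D criterion
print(gene_detected(calls, "E"))  # two contiguous positives: meets the E criterion
```

Requiring a run of adjacent probes, not just a count of positive probes, is what lets isolated cross-hybridizing features be ignored.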
Comparison of hybridization and sequencing results 

Library plasmid inserts were amplified by PCR and the insert junctions with the 
GAL4 domain were sequenced and precisely identified in the yeast genome using the BLAST 
program, the Saccharomyces Genome Database, and the Yeast Protein Database. In parallel, 
clones were used to inoculate 200 µl cultures. Saturated cultures were pooled and processed
as previously described. 

The hybridization results from the YMR117c screen were compared to results
obtained by dideoxy sequencing of all 108 DNA clones. Nineteen of twenty-two independent 
loci were identified by hybridization, with no false positives. Based on analysis of the 
hybridizing array elements, we were also able to identify the region of the gene present in 
each insert (Table 2). 

The three loci that were not detected by array hybridization were either not
represented on the array or were resistant to PCR amplification. One of the undetected
inserts, YLR276c, was difficult to amplify by PCR and could only be sequenced after plasmid
rescue. The other two undetected inserts start within two hundred bases upstream of the 3'
end of the gene, in a region covered by only one probe or by none. Therefore, the signal for
these genes was not recognized as significant because there was not a consistent pattern of
hybridization extending across multiple probes.
EXAMPLE 2 

To further demonstrate this method, a two-hybrid screen for the gene YMR138w
was also carried out and analyzed by array hybridization. YMR138w (CIN4) is a gene in
which mutations cause supersensitivity to the antimicrotubule drug benomyl, as well as
increased rates of chromosome loss (12). YMR138w is homologous to the ARF1 class of
small GTP-binding proteins, but a distinct role in microtubule function is not yet known. The
complete results for this screen are listed in Table 1.

Plasmids and strains

For the YMR138w screen, the yeast strains used were the Y190 and Y187 cyh2R-marked
derivatives of Y159 and Y153, respectively. The library was a yeast cDNA library
fused to the transcriptional activation domain of GAL4 (gift of S. Elledge, Baylor College
of Medicine). The bait vector pTS434 was constructed by cloning CIN4 into pAS1-CYH2
(Clontech) as an NcoI-BamHI fragment.
YMR138w two-hybrid screen

Y190 containing pTS434 was transformed with the cDNA library using a lithium
acetate-based protocol. 5 x 10* transformants were screened by plating on -Ade selective
media, and 114 Ade+ colonies were selected. All 114 colonies were patched onto +Ade plates,
lifted onto BA85 nitrocellulose filters (Schleicher and Schuell), and immersed in liquid
nitrogen for 10 s. The filters were then soaked with 3 ml of Z buffer (60 mM Na2HPO4,
40 mM NaH2PO4, 10 mM KCl, 1 mM MgSO4, and 50 mM β-mercaptoethanol; pH 7.0) containing
0.05 % X-Gal. Filters were incubated at 30°C for 6 h and scored for the development of blue
color. 86 clones were positive by a lacZ filter assay. All 86 clones passed testing for solo
activation by streaking strain Y190 carrying the library isolate and pTS434 on -L plates plus
5 ng/ml cycloheximide. The strains were confirmed to have lost the TRP-containing plasmid by
failure to grow on -W media. 81 clones passed testing for specificity by mating strain Y190
carrying library plasmids with Y187 carrying the negative controls pAS-CDK2, pASlO-lamin,
pAS1-p53, and pAS1-rev (a gift of D. Amberg). Library plasmid inserts were
amplified by PCR and the insert junctions with the GAL4 domain were sequenced and
precisely identified in the yeast genome using the BLAST program, the Saccharomyces
Genome Database (http://genome-www.stanford.edu) and the Yeast Protein Database
(http://www.proteome.com). In parallel, clones were used to inoculate 200 µl cultures.
Saturated cultures were collected, pooled, and processed as previously described.
Hybridization of DNA to the high-density oligonucleotide array 

DNA products generated from the library plasmid pool were partially DNAse I
digested, biotinylated, and hybridized to whole genome arrays. Specifically, arrays were
prewashed with hybridization buffer (described above) 5 min prior to sample hybridization.
Following a 5 min incubation at 99°C, the sample was chilled on ice, allowed to return to
room temperature, and applied to the array. After a 12 hour hybridization at 42°C, the array
was washed 10 times with 6X SSPE-T, washed with 0.5X SSPE-T for 15 min, and stained
with a streptavidin-phycoerythrin conjugate (Molecular Probes) for 10 min, all at 42°C. The
staining buffer contained 6X SSPE-T, 0.5 mg/ml bovine serum albumin, and 1 mg/ml
streptavidin-phycoerythrin. The array was washed 5 times with 6X SSPE-T prior to scanning.
Hybridization patterns were detected by using an argon ion laser to excite phycoerythrin; the
resulting emission was detected using a photomultiplier tube through a 560 nm bandpass
filter (Molecular Dynamics). The entire array was read at a resolution of 7.5 µm in less than
20 min, generating quantitative signal for each probe element. The collected data was
analyzed using image and data analysis software (Affymetrix).

Orientation of genes was determined by hybridization of biotinylated cDNA products. 
All genes identified by array hybridization are listed in Table 1. 
Conclusion

Both two-hybrid screens identified interactors consistent with known results for each
gene. The previously detected interaction of YMR117c with the Prp11p splicing factor has
suggested that YMR117c could have a functional connection with the U2 snRNP (4). Several
of the interactors found in this screen also have known associations with the U2 snRNP. For
example, YML049c has previously been found to interact with the Prp9p splicing factor (4).
Like CIN4, YPL241c (CIN2) was first isolated as a mutation displaying supersensitivity to
antimicrotubule agents (12). Mutations in both CIN2 and CIN4 have already been shown to
be epistatic to mutations in CIN1, a gene implicated in the post-chaperonin folding of yeast
tubulin (13). However, these results are the first evidence for a physical interaction between
CIN2 and CIN4 and suggest that they may act as a complex to regulate specific
protein-folding pathways. Further investigations are needed to establish the biological
significance of interactions from both screens.
Table 1. Yeast ORFs identified by array analysis of two-hybrid screens

                            YMR117c              YMR138w (CIN4)

                            YBR020w (GAL1)       YDL117w
                            YCL032w (STE50)      YDR087c
                            YCR073c (SSK22)      YGL172w (NUP49)
                            YDR104c              YHR141c (MAK18)
                            YER018c              YLR109w
                            YER032w (FIR1)       YNR050c (LYS9)
                            YFR046c              YPL241c (CIN2)
                            YGL197w
                            YIL144w
                            YLR319c (BUD6)
                            YLR419w
                            YML049c
                            YMR224c (MRE11)
                            YOL018c
                            YOL034w
                            YPR010c (RPA135)
                            YPR145w (ASN1)

Non-protein encoding DNA                         18S and 25S rRNA

Reverse orientation         YNL291c              YBR189w
                                                 YDR381w
                                                 YNL301c (RP28B)
                                                 YNR035c
                                                 YOL056w (GPM3)

ORF loci and names are listed for genes detected by array hybridization of PCR products
derived from end-products of a two-hybrid screen. Because inserts in the non-coding
orientation comprise a significant proportion of false positives in the two-hybrid screen,
RNA was transcribed from the upstream T7 promoter and used to generate exclusively
antisense cDNA strands with reverse transcriptase. cDNA products were then biotinylated,
fragmented, and hybridized as described. Genes detected by double stranded DNA
hybridization but absent in cDNA hybridization are considered to be in reverse orientation.
Control experiments were performed to confirm that this method is orientation-specific (data
not shown).
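
The orientation call described in this legend amounts to a set comparison between the two hybridizations. The Python sketch below is illustrative only: the function name and the ORF labels in the example are hypothetical placeholders, not results from the screens.

```python
def classify_orientation(dsdna_hits: set, cdna_hits: set) -> dict:
    """Split genes detected with double stranded DNA into those also seen
    with the strand-specific cDNA (coding orientation) and those seen
    only with double stranded DNA (reverse orientation)."""
    return {
        "coding": sorted(dsdna_hits & cdna_hits),
        "reverse": sorted(dsdna_hits - cdna_hits),
    }


# Hypothetical placeholder ORF labels.
dsdna = {"ORF_A", "ORF_B", "ORF_C"}
cdna = {"ORF_A", "ORF_C"}
print(classify_orientation(dsdna, cdna))
```

Genes that appear only in the cDNA hybridization are ignored in this sketch, since the legend defines the call relative to the double stranded DNA result.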



Table 2. Comparison of sequencing and hybridization for clone 5' ends

ORF name     ORF size (nt)    5' end by sequencing    5' end array probe

YBR020w      1584             1151                    1164
YCL032w      1038             131                     168
YDR104c      3735             3230                    3234
YER032w      2775             1808                    1860
YFR046c      1083             4                       114
YGL197w      4461             3974                    4092
YML049c      4083             2597                    2616
YMR224c      2076             531                     566
YOL018c      1191             257                     324
YOL034w      3279             620                     669


ORF name, ORF size, and the 5' ends of identified genes, determined either by sequencing
or array hybridization, for 10 clones from the YMR117c screen. For genes sequenced
multiple times as different inserts, the end of the most 5' clone is listed. The 5' end as
detected by array hybridization indicates the most 5' nucleotide of the most 5' probe detected
as positive. Small disparities between sequencing and hybridization are the result of insert
5' ends falling in between probes on the array. Although array hybridization does not confirm
that inserts are in frame with respect to the start codon, previous work has shown that
frameshifting events generally lead to production of protein regardless of the precise fusion
junction between gene insert and transcriptional activation domain (11).
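
The size of the disparities mentioned in this legend can be tabulated directly from the Table 2 values. The Python sketch below is illustrative only: the function name is hypothetical, and the ORF label "YOL018c" is a reading of a damaged entry in the source table.

```python
# (ORF size, 5' end by sequencing, 5' end of most 5' positive array probe), in nt.
TABLE_2 = {
    "YBR020w": (1584, 1151, 1164),
    "YCL032w": (1038, 131, 168),
    "YDR104c": (3735, 3230, 3234),
    "YER032w": (2775, 1808, 1860),
    "YFR046c": (1083, 4, 114),
    "YGL197w": (4461, 3974, 4092),
    "YML049c": (4083, 2597, 2616),
    "YMR224c": (2076, 531, 566),
    "YOL018c": (1191, 257, 324),   # label reconstructed from a garbled entry
    "YOL034w": (3279, 620, 669),
}


def five_prime_offsets(table: dict) -> dict:
    """Distance (nt) from the sequenced insert 5' end to the first positive probe."""
    return {orf: array - seq for orf, (_size, seq, array) in table.items()}


offsets = five_prime_offsets(TABLE_2)
for orf, offset in offsets.items():
    print(f"{orf}: {offset} nt")
print("all offsets non-negative:", all(v >= 0 for v in offsets.values()))
print("range:", min(offsets.values()), "to", max(offsets.values()), "nt")
```

With these values every array-derived 5' end lies at or downstream of the sequenced junction, which is what one would expect if the first positive probe is simply the first probe located inside the insert.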
CLAIMS

1. A method for identification of a polynucleotide comprising the steps of:

a) subjecting a polynucleotide of interest to a two-hybrid screening method; 

b) subjecting the polynucleotides selected at step a) to a hybridization 
reaction onto a matrix substrate onto which oligonucleotide or polynucleotide probes have 
been immobilized. 

2. A method for identifying a polynucleotide encoding a first polypeptide, said first 
polypeptide being able to interact with a second polypeptide of interest, comprising the steps 
of: 

a) providing a recombinant host cell containing a detectable gene, wherein 
the detectable gene expresses a detectable polypeptide when the detectable gene is activated 
by an amino acid sequence including a transcriptional activation domain; 

b) providing a first chimeric gene that is capable of being expressed in the 
host cell, the first chimeric gene comprising a DNA sequence that encodes a first hybrid 
polypeptide encoded by a given prokaryotic or eukaryotic organism, said first hybrid 
polypeptide comprising: 

(i) the transcriptional activation domain; and 

(ii) a first test polypeptide that is to be tested for interaction with the 
second test polypeptide; 

c) providing a second chimeric gene that is capable of being expressed in the 
host cell, the second chimeric gene comprising a DNA sequence that encodes a second 
hybrid polypeptide, the second hybrid polypeptide comprising: 

(i) a DNA-binding domain that recognizes a binding site on the 
detectable gene in the host cell; and 

(ii) a second test polypeptide that is to be tested for interaction with at 
least one first test polypeptide; 

wherein interaction between the first test polypeptide and the second test
polypeptide in the host cell causes the transcriptional activation domain to activate
transcription of the detectable gene;

d) introducing the first chimeric gene and the second chimeric gene into the
host cell;

e) subjecting the host cell to conditions under which the first hybrid
polypeptide and the second hybrid polypeptide are expressed in sufficient quantity for the
detectable gene to be activated;

f) selecting the host cell clones for which the detectable gene has been
expressed to a degree greater than expression in the absence of interaction between the first
test polypeptide and the second test polypeptide;

g) optionally pooling the clones that have been positively selected at step f);

h) amplifying the polynucleotides of interest contained in the clones of step
f) or g) with a pair of oligonucleotide primers respectively hybridizing with a plasmid
sequence located at the 5' end of the polynucleotide of interest and with a sequence
complementary to a plasmid sequence located at the 3' end of the polynucleotide of interest
coding for the first polypeptide;

i) hybridizing the amplified polynucleotides obtained at step h) to a matrix 
substrate on which has been bound, at known locations, a plurality of sets of oligonucleotide 
or polynucleotide probes of predetermined sequence, each bound set of oligonucleotide or 
polynucleotide probes being able to hybridize with a specific polynucleotide carried by the 
genome of the organism from which the polynucleotide coding for the first test polypeptide 
belongs; 

j) detecting the locations of the polynucleotide hybrid complexes obtained 
at step i) on the matrix substrate; 

k) optionally determining the quantity of each hybrid complex detected at
step j).

3. A method for identifying a polynucleotide encoding a first polypeptide that 
inhibits the interaction between a second polypeptide and a third polypeptide comprising the 
steps of: 

a) providing a recombinant host cell containing a detectable gene wherein 
the detectable gene expresses a detectable polypeptide when the detectable gene is activated 
by an amino acid sequence including a transcriptional activation domain; 

b) providing a first gene that is capable of being expressed in the host cell, 
said first gene comprising a DNA sequence that encodes a first polypeptide encoded by a 
given prokaryotic or eukaryotic organism, and for which its inhibition properties on the 
interaction between a second and a third polypeptide is tested; 
c) providing a second chimeric gene that is capable of being expressed in the
host cell, the second chimeric gene comprising a DNA sequence that encodes a second
hybrid polypeptide encoded by a given prokaryotic or eukaryotic organism, said second
hybrid polypeptide comprising:

(i) the transcriptional activation domain; and

(ii) a second test polypeptide that interacts with a third polypeptide;

d) providing a third chimeric gene that is capable of being expressed in the 
host cell, the third chimeric gene comprising a DNA sequence that encodes a third hybrid 
polypeptide, the third hybrid polypeptide comprising: 

(i) a DNA-binding domain that recognizes a binding site on the
detectable gene in the host cell; and

(ii) a third test polypeptide that interacts with the second test
polypeptide;

wherein interaction between the second test polypeptide and the third test polypeptide in the
host cell causes the transcriptional activation domain to activate transcription of the
detectable gene;

e) introducing the first gene, the second chimeric gene and the third 
chimeric gene into the host cell; 

f) subjecting the host cell to conditions under which the second hybrid
polypeptide and the third polypeptide are expressed in sufficient quantity for the detectable
gene to be activated;

g) selecting the host cell clones for which the detectable gene has been 
expressed to a degree lesser than its expression level in the absence of expression of the first 
polypeptide; 

h) optionally pooling the clones that have been positively selected at step g);

i) amplifying the polynucleotides of interest contained in the clones of step
g) or h) with a pair of oligonucleotide primers respectively hybridizing with a plasmid
sequence located at the 5' end of the polynucleotide of interest and with a sequence
complementary to a plasmid sequence located at the 3' end of the polynucleotide of interest
coding for the first polypeptide;

j) hybridizing the amplified polynucleotides obtained at step i) to a matrix
substrate on which has been bound, at known locations, a plurality of sets of oligonucleotide
or polynucleotide probes of predetermined sequence, each bound set of oligonucleotide or
polynucleotide probes being able to hybridize with a specific polynucleotide carried by the
genome of the organism from which the polynucleotide coding for the first test polypeptide
belongs;

k) detecting the locations of the polynucleotide hybrid complexes obtained 
at step j) on the matrix substrate; 

l) optionally determining the quantity of each hybrid complex detected at
step k).

4. The method according to claim 2 or 3, wherein 

a) some of the polynucleotides obtained at step f) or g) of claim 2 or at step
g) or h) of claim 3 are separated and subjected to a DNA amplification reaction with a pair
of primers wherein at least one of the primers comprises, at its 5' end, a promoter region
recognized by a specific RNA polymerase;

b) the resulting amplified polynucleotides of the above step a) are incubated 
in the presence of the corresponding RNA polymerase in an acellular enzyme medium; 

c) the mRNA obtained at the above step b) is incubated in the presence of 
a reverse transcriptase type enzyme; 

d) the cDNA molecule obtained at the above step c) is hybridized to a
matrix substrate on which has been bound, at known locations, a plurality of sets of
oligonucleotides of predetermined sequence, each bound set of oligonucleotides being able
to hybridize with a specific polynucleotide carried by the genome of the organism from
which the polynucleotide coding for the first test polypeptide belongs; and

e) the locations of the polynucleotide hybrid complexes obtained at step d)
on the matrix substrate are determined and compared with the results obtained from the
method of claim 2 or claim 3.

5. The method according to any one of claims 1 to 3, wherein the transcriptional 
activator is from GAL4. 

6. The method according to claim 4, wherein the promoter region contained in the 
primer used at step a) is the bacteriophage T7 promoter region and the RNA polymerase 
used at step b) is the bacteriophage T7 polymerase. 

7. The method according to any one of claims 1 to 3 wherein the part of the first 
chimeric gene coding for the first test polypeptide is provided by a DNA library. 
8. The method according to claim 7 wherein the DNA library has been prepared
from the genome or from the mRNA of a prokaryotic host.

9. The method according to claim 7 wherein the DNA library has been prepared 
from the genome or from the mRNA of a eukaryotic host. 

10. The method according to claim 10 wherein the DNA library has been prepared
from the genomic DNA of Saccharomyces cerevisiae.

11. The method according to any one of claims 1 to 3, wherein the sets of
oligonucleotide or polynucleotide probes bound to the substrate matrix are designed in such
a manner that every region of the whole genome of the prokaryotic or eukaryotic host
organism is able to specifically hybridize to at least one of said sets of oligonucleotide or
polynucleotide probes.

12. The method according to claim 11, wherein two sets of oligonucleotide or
polynucleotide probes bound to the matrix substrate are complementary to adjacent
sequences in the genome of the prokaryotic or eukaryotic host that are separated from each
other by less than one kilobase.

13. The method according to claim 12, wherein two sets of oligonucleotide or 
polynucleotide probes bound to the matrix substrate are complementary to adjacent 
sequences in the genome of the host, such that the distance between the sequences is less 
than 500 bases. 

14. The method according to claim 13, wherein two sets of oligonucleotide or
polynucleotide probes bound to the matrix substrate are complementary to adjacent
sequences in the genome of the host, such that the distance between the sequences is about
50 bases.

15. A polynucleotide molecule that has been obtained with the method according
to any one of claims 1 to 3.

16. A polypeptide that is encoded by a polynucleotide according to claim 15. 

17. A polypeptide that has been obtained with the method according to any one of 
claims 1 to 3. 

18. A peptide comprising a peptide domain interacting with the second test
polypeptide of interest.

19. A matrix substrate on which has been bound, at known locations, a plurality of
sets of oligonucleotide or polynucleotide probes of predetermined sequence, each bound set
of oligonucleotide or polynucleotide probes being able to hybridize with a specific
polynucleotide carried by the genome of the organism from which the polynucleotide coding
for the first test polypeptide belongs.

20. A matrix substrate comprising:

a) a plurality of immobilized sets of oligonucleotide or polynucleotide
probes of predetermined sequence, each bound set of oligonucleotide or polynucleotide
probes being able to hybridize with a specific polynucleotide carried by the genome of the
organism from which the polynucleotide coding for the first test polypeptide belongs;

b) at least one polynucleotide coding for one selected first test polypeptide
being hybridized thereto.

21. A computer useable medium containing computer readable data related to the
hybrid complexes formed within a matrix substrate according to claim 19 or claim 20.
SEQUENCE LISTING

<110> INSTITUT PASTEUR 

STANFORD UNIVERSITY 
AFFYMETRIX 

<120> SCREENING INTERACTOR MOLECULES WITH WHOLE GENOME 
OLIGONUCLEOTIDE OR POLYNUCLEOTIDE ARRAYS 

<130> 03495-0160-01000 

<140> 

<141> 

<150> US 09/003,335 and US 09/154,972 
<151> 1998-01-06 and 1998-09-17 
<160> 6 

<170> PatentIn Ver. 2.0
<210> 1 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: adaptor 
<400> 1 

atcccggacg aaggcc 16 
<210> 2 
<211> 13 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: adaptor 
<400> 2 

ggccttcgtc cgg 13
<210> 3 
<211> 1134 
<212> DNA 
<213> Saccharomyces cerevisiae
<400> 3

atgaagctac tgtcttctat cgaacaagca tgcgatattt gccgacttaa aaagctcaag 60
tgctccaaag aaaaaccgaa gtgcgccaag tgtctgaaga acaactggga [missing] 120
tctcccaaaa ccaaaaggtc tccgctgact agggcacatc tgacagaagt [illegible] 180
ctagaaagac tggaacagct atttctactg atttttcctc gagaagacct taacataatiti 240
ttgaaaatgg attctttaca ggatataaaa gcattgttaa caggattatt tatacaaoat 300
aatgtgaata aagatgccgt cacagataga ttggcttcag toaaaactaa tatocctcta 360
acattgagac agcatagaat aagtgcgaca tcatcatcgg aaaaaaataa taacaaaofit 420
caaagacagt tgactgtatc gccggaattt atgaccataq aaaccccoao oatccaaaoff 480
aacgcaagag ccatgtcaca aaaggataac ctactcgaca atccggttga atttttaaaa 540
gaggtcagag aaagttttga tattcagcaa gatgttgatg ccatgaaaag aatccaacac 600
gatcttgatg ttataaaaga ggaaagcgaa gcaagaatta gtaaagagca ttcaaaaatt 660
tctoaatrna [illegible] gaaugcggaa agaataaatg ttgctaaatt ggagggagac 720
ttagaatata ctaacgaaga gagcaatgag tttggtagta aagacgaact agttaaactt 780
ctgaaagatt tggacggatt ggaacgtaat attgtgtcac ttcgaagtga attggacgaa 840
aagatgaaat tgtacctcaa agatagtgaa ataatatcca caccgaacgg ttccaaaata 900
aaagcaaaag taattgaacc tgagctggaa gaacaaagtg cggtcacccc ggaagcaaac 960
gaaaatattc taaaattgaa gctatacaga tctttaggag ttattttgga tttagaaaat 1020
gatcaagtcc ttattaacag aaaaaatgat gggaatattg atattttacc cttggacaat 1080
aacctcagcg atttctataa gaccaaatac atctgggaaa gattaggaaa gtga 1134



<210> 4 
<211> 378 
<212> PRT 

<213> Saccharomyces cerevisiae 
<400> 4 

Met Lys Leu Leu Ser Ser Ile Glu Gln Ala Cys Asp Ile Cys Arg Leu
  1               5                  10                  15

Lys Lys Leu Lys Cys Ser Lys Glu Lys Pro Lys Cys Ala Lys Cys Leu
             20                  25                  30

Lys Asn Asn Trp Glu Cys Arg Tyr Ser Pro Lys Thr Lys Arg Ser Pro
         35                  40                  45

Leu Thr Arg Ala His Leu Thr Glu Val Glu Ser Arg Leu Glu Arg Leu
     50                  55                  60

Glu Gln Leu Phe Leu Leu Ile Phe Pro Arg Glu Asp Leu Asp Met Ile
 65                  70                  75                  80

Leu Lys Met Asp Ser Leu Gln Asp Ile Lys Ala Leu Leu Thr Gly Leu
                 85                  90                  95

Phe Val Gln Asp Asn Val Asn Lys Asp Ala Val Thr Asp Arg Leu Ala
            100                 105                 110

Ser Val Glu Thr Asp Met Pro Leu Thr Leu Arg Gln His Arg Ile Ser
        115                 120                 125

Ala Thr Ser Ser Ser Glu Glu Ser Ser Asn Lys Gly Gln Arg Gln Leu
    130                 135                 140

Thr Val Ser Pro Glu Phe Met Ala Met Glu Ala Pro Gly Ile Arg Arg
145                 150                 155                 160

Asn Ala Arg Ala Met Ser Gln Lys Asp Asn Leu Leu Asp Asn Pro Val
                165                 170                 175

Glu Phe Leu Lys Glu Val Arg Glu Ser Phe Asp Ile Gln Gln Asp Val
            180                 185                 190

Asp Ala Met Lys Arg Ile Arg His Asp Leu Asp Val Ile Lys Glu Glu
        195                 200                 205

Ser Glu Ala Arg Ile Ser Lys Glu His Ser Lys Val Ser Glu Ser Asn
    210                 215                 220

Lys Lys Leu Asn Ala Glu Arg Ile Asn Val Ala Lys Leu Glu Gly Asp
225                 230                 235                 240

Leu Glu Tyr Thr Asn Glu Glu Ser Asn Glu Phe Gly Ser Lys Asp Glu
                245                 250                 255

Leu Val Lys Leu Leu Lys Asp Leu Asp Gly Leu Glu Arg Asn Ile Val
            260                 265                 270

Ser Leu Arg Ser Glu Leu Asp Glu Lys Met Lys Leu Tyr Leu Lys Asp
        275                 280                 285

Ser Glu Ile Ile Ser Thr Pro Asn Gly Ser Lys Ile Lys Ala Lys Val
    290                 295                 300

Ile Glu Pro Glu Leu Glu Glu Gln Ser Ala Val Thr Pro Glu Ala Asn
305                 310                 315                 320

Glu Asn Ile Leu Lys Leu Lys Leu Tyr Arg Ser Leu Gly Val Ile Leu
                325                 330                 335

Asp Leu Glu Asn Asp Gln Val Leu Ile Asn Arg Lys Asn Asp Gly Asn
            340                 345                 350

Ile Asp Ile Leu Pro Leu Asp Asn Asn Leu Ser Asp Phe Tyr Lys Thr
        355                 360                 365

Lys Tyr Ile Trp Glu Arg Leu Gly Lys Glx
    370                 375

<210> 5 
<211> 47 
<212> DNA 

<213> Saccharomyces cerevisiae 
<400> 5 

gaattgtaat acgactcact atagggaggt gatgaagata ccccacc 47 
<210> 6 
<211> 54 
<212> DNA 

<213> Saccharomyces cerevisiae 
<400> 6 

agatgcaatt aaccctcact aaagggagac ggggtttttc agtatctacg attc 54 